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Abstract 

Java  implementations  of  algorithms  used  by  spreadsheets  to  au¬ 
tomatically  recompute  the  set  of  cells  dependent  on  a  changed  cell 
are  described  using  a  mathematical  model  for  spreadsheets  based  on 
graph  theory.  These  solutions  comprise  part  of  a  Java  API  that  allows 
a  client  application  to  read,  modify,  and  maintain  spreadsheet  data 
without  using  the  spreadsheet  application  program  that  produced  it. 
Features  of  the  Java  language  that  successfully  improve  the  running 
time  performance  of  the  algorithms  are  also  described. 


1  Introduction 

This  paper  describes  algorithms  for  the  recomputation  of  spreadsheet  cells. 
The  assumed  context  for  such  a  recomputation  occurs  when  a  cell’s  value  is 
changed.  In  general,  a  cell  is  dependent  on  several  others  for  its  value  as 
dehned  by  its  formula.  Thus,  to  maintain  the  integrity  of  the  spreadsheet, 
the  reading  of  a  cell  value  requires  the  recomputation  of  this  cell  once  any  of 
the  cells  on  which  it  depends  has  changed. 

The  algorithms  of  this  paper  form  the  basis  of  ExcelComp  [9],  a  Java 
[24]  application  program  interface  (API)  written  by  the  author  that  al¬ 
lows  the  client  application  to  read  a  specially  formatted  Microsoft  Excel  [4] 

*This  work  was  supported  by  US  Army  CECOM  Contract  DAAB07-01-C-C201,  and 
SPAWAR  PMW  176-1. 
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(henceforth  referred  to  as  “Excel”)  spreadsheet  output  hie,  and  then  make 
changes  to  cell  values  within  the  ExcelComp  representation  of  this  spread¬ 
sheet.  Changes  to  cell  values  are  followed  by  the  automatic  recomputation  of 
dependent  cell  values  using  ExcelComp  methods.  ExcelComp  thus  allows  the 
client  programmer  to  provide  its  users  with  both  the  data  and  behavior  of  an 
existing  spreadsheet  without  the  use  of  the  original  spreadsheet  application 
program  that  produced  it. 

During  the  development  of  ExcelComp,  it  was  realized  that  choices  of 
algorithms  to  perform  cell  recomputation  involve  two  principal  trade-offs:  1) 
ease  of  use,  and  2)  running  time  performance.  On  the  one  hand,  one  mode  of 
ExcelComp  can  simply  load  a  hie  at  run  time  that  represents  the  spreadsheet, 
and  then  provide  its  services.  While  this  mode  is  satisfactory  for  many  tasks, 
it  is  unsuitable  for  those  that  require  a  large  number  of  cell  recomputations 
to  support  dynamic  updates  to  real-time  outputs,  for  example,  the  updating 
of  an  on-screen  map  that  depends  on  thousands  of  cell  recomputations.  To 
support  this  latter  task,  a  second  mode  was  developed  that  allows  faster  cell 
recomputation  at  the  expense  of  a  less  convenient  installation  procedure  for 
the  spreadsheet  representation. 

These  considerations  make  ExcelComp  an  efficient,  platform-  and  vendor- 
independent  Java  API  that  provides  built-in  spreadsheet  emulation  for  ap¬ 
plication  end-users.  In  particular,  end-users  are  relieved  of  the  burden  of 
conducting  their  spreadsheet  tasks  outside  the  domain  of  their  running  ap¬ 
plication.  In  addition  to  the  efficiency  won  by  executing  spreadsheet  tasks 
natively,  ExcelComp  also  obviates  the  need  for  costly  additional  licenses  re¬ 
quired  for  multiple  users  of  the  application  software  that  produced  the  spread¬ 
sheet.  Being  written  in  the  modern  Java  programming  language  allows  the 
client  programmer  to  easily  integrate  ExcelComp’s  functionality  into  current 
software  development  efforts. 

While  other  descriptions  of  spreadsheet  algorithms  are  available  [16]  [17], 
this  paper  is  distinctive  in  its  use  of  graph  theory  to  improve  the  reader’s 
ability  to  visualize  the  algorithms,  and  to  provide  a  basis  for  a  proof  of 
algorithm  correctness.  It  also  presents  solutions  that  leverage  features  in  the 
object-oriented  Java  API  that  lead  to  succinct,  yet  powerful  code. 

This  paper  focuses  on  the  subject  algorithms  and  the  specihc  features  of 
the  Java  language  used  by  ExcelComp  that  are  well-suited  for  their  imple¬ 
mentation.  Readers  interested  in  a  more  detailed  specihcation  of  ExcelComp 
from  the  client  programmer’s  perspective  may  contact  the  author. 
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2  A  Scenario 


Before  getting  into  the  technical  details  that  comprise  this  report,  it  would 
be  helpful  to  consider  a  motivational  scenario. 

Consider  a  spreadsheet  of  hnancial  data,  where  subtotals,  interest  earned, 
and  a  grand  total  might  be  some  examples  of  computed  quantities  that  each 
depend  on  entries  in  several  cells.  Analysts  may  use  such  a  spreadsheet  to 
play  “what-if”  games  by  varying  values  in  cells  that  will  affect  some  target 
cell,  such  as  interest  earned.  The  spreadsheet  program  would  then  auto¬ 
matically  recompute  all  cells  that  are  dependent  on  the  ones  changed.  This 
capability  of  a  spreadsheet  is  its  hallmark,  and  distinguishes  it  from  a  simple 
table  of  values  that  have  no  computational  relationship  to  one  another. 

Suppose  that  a  computer  program  needs  this  spreadsheet  of  information 
and  auto-update  capability  to  carry  out  its  tasks.  This  program  is  to  provide 
its  users  with  the  what-if  capability,  and  therefore  requires  not  only  cell 
values,  but  also  cell  formulas.  Since  it  needs  to  emulate  the  recomputation 
function  of  the  spreadsheet,  it  must  implement  algorithms  that  return  the 
same  recomputed  values  as  the  spreadsheet  program.  It  is  these  algorithms 
of  dependent  cell  recomputation  that  are  the  subject  of  this  report. 

3  Modes 

ExcelComp  has  two  modes  of  operation: 

Interpreted  mode  requires  the  reading  of  an  extensible  Markup  Language 
(XML)  [2]  representation  of  a  spreadsheet.  Once  this  hie  is  parsed  and 
loaded  into  ExcelComp’s  data  structures,  the  subject  algorithms  are 
implemented  via  ExcelComp  methods.  It  is  called  interpreted,  because 
cell  formulas  are  interpreted  at  run  time  using  a  custom  parser  that 
recognizes  a  subset  of  Excel’s  formula  language. 

Compiled  mode  uses  cell-specihc  Java  classes,  created  as  an  offline  pre¬ 
processing  task,  to  evaluate  a  cell  by  recursively  evaluating  each  child 
cell  referenced  in  its  formula’s  parse  tree. 

The  interpreted  mode  is  the  slowest  of  the  two.  It  has  the  advantage,  how¬ 
ever,  of  requiring  less  preprocessing,  namely  just  the  creation  of  the  XML 
input  hie.  During  the  development  of  ExcelComp,  this  XML  hie  was  pro¬ 
duced  by  running  an  Excel  macro  [8].  It  is  also  easier  in  this  mode  for  the 
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client  programmer  to  provide  the  application  user  the  flexibility  to  apply  Ex- 
celComp  to  a  different  spreadsheet  by  simply  changing  a  filename  reference 
in  ExcelComp’s  constructor.  The  comprehensive  update  of  all  dependent  cell 
values  upon  the  change  of  a  constant  cell  in  this  mode  allows  the  ExcelComp 
user  to  highlight  the  newly  computed  dependents.  Such  an  application  was 
developed  by  the  author,  where  changed  cells  are  shown  in  a  JTable  [22]  with 
changed  values  highlighted  in  red  to  allow  the  user  to  gain  insight  into  the 
impact  of  the  change  of  a  cell  value. 

The  compiled  mode  is  much  faster  than  the  interpreted  mode,  and  should 
be  preferred  in  cases  where  a  client  demands  exceptionally  high  execution 
time  performance,  e.g.,  providing  a  real-time  screen  update  that  depends 
on  the  recomputations.  The  preprocessing  needed  for  compiled  mode  in¬ 
cludes  the  generation  of  Java  source  code  that  implements  the  formulas  of 
the  spreadsheet.  This  source  code  generation  was  automated  by  using  a  Java 
class  that  uses  classes  produced  by  parser  generators  [1,  14]  to  implement  a 
custom  parser  for  a  subset  of  the  Excel  formula  language.  In  general,  Java 
classes  that  represent  each  cell  formula  in  the  spreadsheet  must  be  created, 
and  then  be  referenced  by  the  classpath  option  for  the  Java  virtual  machine 
(JVM)  [18]. 

4  Computation  Model 

To  provide  a  lingua  franca  for  the  discussion  to  follow,  we  need  to  identify 
parts  of  a  spreadsheet  that  are  useful  to  us.  While  the  intent  is  to  have  a 
model  that  is  generic,  platform-,  and  vendor-independent,  the  use  of  Excel 
as  a  reference  implementation  for  ExcelComp  influenced  the  latter’s  design. 
The  language  of  this  paper  will  be  similarly  influenced,  however  it  is  germane 
to  any  spreadsheet  that  adheres  to  the  computation  model  described  here. 
While  a  queue-based  computation  model  has  been  successfully  developed 
[16],  we  will  And  it  advantageous  to  develop  a  model  using  graph  theory. 
In  particular,  proof  of  correctness  of  the  algorithms  can  benefit  from  such  a 
treatment. 

There  are  many  ways  to  present  data  using  a  spreadsheet.  For  example, 
two  principal  classes  of  representation  provided  by  Excel  are  the  workbook, 
and  the  chart.  We  consider  only  the  tabular  computation  environment  found 
in  a  workbook.  The  term  spreadsheet  will  thus  be  used  as  a  synonym  for 
workbook. 
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A  spreadsheet  is  a  finite  set  of  cells  arranged  as  a  matrix.  A  cell  is  a 
set  that  contains  three  elements  of  interest: 

1.  a  value, 

2.  a  formula,  and 

3.  a  cell  reference. 

A  cell’s  value  is  the  result  of  the  computation  specified  by  its  formula. 
In  general,  this  value  may  be  a  real  number,  a  string,  or  some  other  data 
type.  To  simplify  this  model,  we  will  assume  that  these  values  are  real.  A 
cell’s  formula  is  an  expression  that  defines  a  cell’s  value  as  a  function,  /,  of 
a  subset  of  the  spreadsheet’s  cell  values.  Let  C  denote  the  set  of  cell  values 
for  some  spreadsheet.  More  formally  then,  we  have 

(1) 

where  C”  is  the  n-fold  Cartesian  product  of  C  for  some  positive  integer  n. 
For  this  model,  we  define  C  =  M.  In  general,  a  formula  expresses  a  composite 
of  functions  in  the  form  (1).  A  formula  that  is  not  composite  has  constant 
values  for  its  arguments.  Such  a  formula  is  termed  a  constant  formula. 

A  cell  reference  is  an  ordered  pair  that  specifies  a  cell  uniquely  within 
the  spreadsheet.  The  Excel  “Al  reference  style”  [5]  will  be  used,  where  the 
first  element  specifies  the  column,  and  the  second  element  specifies  the  row. 
For  example,  B3  designates  the  cell  at  the  intersection  of  the  second  column 
and  the  third  row.  We  use  Xi  to  denote  a  variable  whose  value  is  a  cell 
reference.  In  the  context  of  a  formula,  a  cell  reference  is  mapped  to  its 
corresponding  cell  value  according  to  (1).  We  thus  see  that,  in  general,  a  cell 
reference  refers  to  a  composite  function  that  is  defined  by  the  formula  for 
that  cell.  For  example,  if  cell  Ml  depends  on  N1  and  P2,  and  N1  depends  on 
Q3,  and  P2  depends  on  Rl,  then  the  value  of  Ml  expressed  as  a  composite 
function  is  M1(N1(Q3),P2(R1)).  For  this  expression  to  be  fully  resolved,  the 
formulas  for  both  Q3  and  Rl  must  be  constant  formulas. 

To  successfully  develop  the  subject  algorithms,  the  scope  of  the  set  of 
spreadsheets  to  be  considered  must  be  defined.  This  will  be  done  by  identi¬ 
fying  properties  that  serve  as  axioms  for  the  spreadsheets  of  interest.  Spread¬ 
sheets  that  satisfy  the  stated  properties  are  termed  admissible. 

There  is  a  cohesive  relationship  between  a  cell’s  formula  and  its  value,  as 
described  in  the  following  property. 
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Property  1.  A  cell’s  value  is  completely  determined  recursively  by  its  for¬ 
mula. 

To  be  clear,  Property  1  states  that  a  cell’s  value  depends  only  on  its 
formula,  but  that  formula  in  general  depends  on  other  cell  values  that  are,  in 
turn,  dependent  only  on  their  formulas.  This  recursion  ends  when  a  cell  with 
a  constant  formula  is  reached,  thus  resolving  all  of  the  recursive  cell  value 
references. 

The  formulas  of  a  spreadsheet  dehne  a  potentially  involved  relation  among 
its  cells.  Given  cell  Xi,  its  formula  may  be  a  constant  formula,  or  a  non¬ 
constant  formula  that  dehnes  Xi’s  value  as  a  function  of  other  cell  values  in 
the  spreadsheet.  In  the  latter  case,  we  say  that  Xi  is  dependent  on  the  cells 
referenced  in  its  formula.  That  is,  the  value  of  Xi  depends  on  the  values  of 
the  cells  referenced  in  its  formula.  The  term  “value”  will  often  be  omitted 
when  the  context  of  “dependent”  is  clear.  We  also  refer  to  each  cell  referenced 
in  Xi’s  formula  as  a  child  of  Xi.  Similarly,  Xi  is  a  parent  of  its  children.  A 
parent  and  any  parent  of  a  parent  is  termed  an  ancestor.  A  child  and  any 
child  of  a  child  is  termed  a  descendant. 

The  relation  among  the  dependent  cells  in  a  spreadsheet  may  be  repre¬ 
sented  as  a  weakly  connected  directed  graph,  G(V,  E),  where  V  is  the  set  of 
vertices,  and  E  is  the  set  of  edges.  Figure  1  illustrates  this  spreadsheet  de¬ 
pendency  graph  with  an  example  that  will  be  used  throughout  this  paper. 


Figure  1:  Example  of  a  Spreadsheet  Dependency  Graph 

Each  vertex  of  the  graph  is  a  cell  represented  as  a  box.  The  cell  refer¬ 
ence  is  given  at  the  top  of  each  cell  box,  and  its  corresponding  formula  is 
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given  in  smaller  type  at  the  bottom  of  the  box.  It  is  understood  that  the 
value  of  a  cell  is  assigned  the  value  computed  by  its  formula.  The  cell  values 
are  omitted  from  the  cell  boxes  in  Figure  1  for  brevity.  Note  that  Cl  has 
two  parents:  El  and  FI.  This  lack  of  a  unique  parent  in  general  for  each 
cell  precludes  regarding  this  structure  as  a  rooted  tree.  Although  the  more 
general  graph  is  the  appropriate  data  structure  for  representing  cell  depen¬ 
dencies  in  a  spreadsheet,  we  will  End  that  rooted  trees  will  also  be  useful  in 
the  algorithms  to  be  described. 

In  describing  Figure  1,  some  of  the  basics  of  graph  theory,  using  [6]  as  a 
guide,  will  be  described  as  needed. 

In  Figure  1,  the  set  of  vertices  V  are  the  cells,  and  the  set  of  directed 
edges  E  is  defined  according  to  the  parent/child  relationships.  Each  edge  is 
directed  from  a  parent  to  a  child.  The  set  E  is  a  subset  of  the  ordered  pairs  of 
V.  Let  a  sequence  of  vertices  be  ordered  such  that  Vi^i  is  a  parent  of  Vi,  and 
let  Vi-iVi  denote  the  edge  directed  from  nj_i  to  Uj.  A  sequence  of  edges  and 
vertices  that  has  the  form  {vqVi,viV2i  ■  ■  ■  ,Vn-iVn\  for  distinct  Vi  is  defined 
as  a  directed  path  linking  vq  and  Vn-  A  directed  path  is  a  directed  cycle 
if  it  consists  of  2  or  more  vertices,  and  Vn  =  uo- 

We  are  now  led  to  an  important  stipulation  concerning  spreadsheets. 

Property  2.  An  admissible  spreadsheet  contains  no  directed  cycles. 

The  Excel  term  for  directed  cycle  is  circular  reference.  Property  2  thus 
states  that  circular  references  are  prohibited. 

A  graph  from  the  subset  of  graphs  just  described  is  termed  a  directed 
acyclic  graph  or  dag  [3,  section  B.4]. 

Readers  familiar  with  the  GNU  Make  tool  [28]  will  recognize  this  depen¬ 
dent  cell  recomputation  problem  as  being  analogous  to  the  problem  solved  by 
Make:  automatic  determination  of  the  pieces  of  code  that  require  recompila¬ 
tion,  and  issuing  the  appropriate  commands  to  bring  the  program  up  to  date. 
Make  uses  a  dependency  graph  model.  See  [26]  for  details  and  illustrations 
of  Make’s  dependency  graphs,  including  a  description  of  pitfalls  concerning 
the  proper  use  of  Make  to  ensure  correct  dependency  graph  construction  in 
large  projects. 
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5  Interpreted  mode 

This  section  describes  those  algorithms  that  are  implemented  in  the  inter¬ 
preted  mode  of  ExcelComp.  The  integrity  of  the  spreadsheet  is  preserved 
in  this  mode  by  recomputing  all  dependent  cells  of  a  cell  whose  constant 
formula  (value)  has  changed.  This  behavior  ensures  that,  upon  the  commit¬ 
ment  of  a  new  value  to  a  cell,  the  entire  spreadsheet  will  be  updated  to  reflect 
the  change.  This  matches  the  default  behavior  of  Excel,  where  it  is  termed 
automatic  calculation.  Pseudocode  is  given  in  this  section  to  highlight  the 
salient  features  of  the  ExcelComp  interpreted  mode  algorithms;  the  actual 
code  differs  in  some  of  the  implementation  details. 

5.1  Dependency  Set  Generation 

Suppose  that  we  are  examining  a  spreadsheet  for  the  first  time,  and  have 
no  a  priori  knowledge  of  its  contents.  Say  we  want  to  modify  cell  Al.  By 
this,  we  mean  that  Al  has  a  constant  formula  that  is  to  be  changed  to 
another  constant  formula.  The  more  general  act  of  modifying  or  adding  a 
non-constant  formula  will  not  be  discussed  here;  it  is  assumed  that  non¬ 
constant  formulas  remain  fixed  throughout  our  analysis. 

Consider  the  impact  that  this  change  has  on  the  cells  that  are  dependent 
on  Al  in  Figure  1.  First,  this  change  in  the  formula  causes  a  recomputation 
of  Al’s  value.  This  change  will,  in  general,  affect  the  values  of  all  cells  whose 
formulas  reference  Al.  These  cells,  B1  and  Cl,  are  directly  dependent 
on  Al,  and  will  need  to  be  recomputed  as  a  result.  In  general,  the  values 
of  these  direct  dependents  will  change  as  a  result  of  recomputation.  These 
direct  dependents  must  then  be  considered  in  the  same  light  as  Al;  that  is, 
we  need  to  find  and  recompute  the  direct  dependents  of  the  direct  dependents 
of  Al.  These  cells  are  El  and  El.  From  the  point  of  view  of  Al,  these  latter 
cells  are  indirect  dependents  of  Al. 

The  algorithm  for  discovering  the  set  of  dependent  cells  of  a  given  cell  is 
thus  recursive.  Let  d  be  a  set-valued  function  d  :  2^  ^  2^  that  computes 
the  set  of  direct  dependents  of  a  subset  of  cells  from  C.  (Here,  2^  denotes 
the  set  of  all  subsets  of  C,  also  known  as  the  power  set.)  The  procedure  just 
described  can  now  be  expressed  as 


Aj+i  —  d(Aj),  Ai  G  2^, 


Aq  —  {Al}. 


(2) 


Aq  is  set  to  {Al}  in  (2)  to  reflect  Figure  1,  but  in  general  it  will  be 
assigned  to  the  set  containing  the  cells  that  were  changed. 

Dependency  Set  Generation  may  be  recognized  as  an  implementation  of 
the  breadth-first  search  (BFS)  algorithm  for  graphs  described  in  [3].  The 
“frontier  between  discovered  and  undiscovered  vertices”  described  in  [3]  ap¬ 
plied  here  divides  two  generations  of  dependencies,  i.e.,  child/parent,  par¬ 
ent/grandparent,  etc.  Also,  note  that  we  begin  with  a  child,  and  then  dis¬ 
cover  ancestors,  in  reverse  to  the  naming  convention  used  in  [3]. 

For  recurrence  relation  (2)  to  be  practical,  we  must  be  assured  that  it 
terminates.  Indeed,  an  essential  property  of  an  algorithm  is  its  hniteness; 
according  to  [15], “An  algorithm  must  always  terminate  after  a  hnite  number 
of  steps.”  This  assurance  is  given  now  as  a  theorem. 

Theorem  1.  Recurrence  relation  (2)  terminates. 

Proof.  Because  the  indices  of  Aj  in  the  recursion  are  strictly  increasing,  it  is 
sufficient  to  show  that  ^imax  3  f  <  imax- 

Assume  that  the  spreadsheet  under  consideration  has  N  cells.  Let  ||Aj|| 
denote  the  number  of  parents  of  Aj_i.  Because  of  Property  2,  the  number 
of  candidate  parents  for  Aq  is  A^  —  1,  since  a  cell  cannot  be  a  parent  of  itself 
(thereby  creating  a  circular  reference).  Similarly,  the  number  of  available 
parents  for  Ai  is  at  most  N  —  2,  since  both  the  children  and  grandchildren 
of  A2  must  be  excluded  to  avoid  circular  references.  In  general  then, 

||A,||  <N-t, 

and  in  particular,  ||AAr||  =  0.  At  this  point,  no  parents  are  available  to 
continue  further.  We  have  thus  shown  that  i  <  N]  that  is,  imax  =  N.  □ 

The  hnal  product  of  Dependency  Set  Generation  is  formed  by  taking  the 
union  of  the  sets  of  dependent  cells  found  in  (2).  Assuming  that  the  final 
index  computed  in  (2)  is  n,  and  letting  V  be  the  set  of  dependent  cells,  we 
have 


V  =  \JA,  a,  €2^  (3) 

i=l 
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Input:  Set  of  cells  for  which  dependent  cells  will  be  found 
Output :  Set  of  cells  that  are  dependent  on  the  input 
set  of  cells 
DepSetGen (depSet , m) 

initial_size  =  depSet . size () ; 
for  (k=m  through  initial_size-l;k++)  { 
for  (j=0  through  SPRSHEET_SIZE-1 ; j++)  { 

//  Find  all  direct  dependents  of  depSet [k] . 

if  (sprsheet [j] . formula  contains  depSet [k] .ref )  { 
depSet . add(sprsheet [j] .ref); 

} 

} 

DepSetGen(depSet , initial_size) ; 

} 

if  (m  ==  0)  {. 

depSet . delete (depSet  [0] ) ; 

} 

> _ 

Figure  2:  Dependency  Set  Generation  Algorithm 


5.1.1  Example 

The  Dependency  Set  Generation  algorithm  is  codihed  in  Figure  2.  Let  us 
apply  this  algorithm  to  hnding  all  dependents  of  Al  in  Figure  1. 

Assume  the  number  of  cells  in  the  spreadsheet,  SPRSHEET_SIZE,  is  6. 
The  array  sprsheet  holds  all  the  cells  in  the  spreadsheet.  Each  element 
of  sprsheet  has  the  fields  ref  and  formula  for  cell  reference  and  formula, 
respectively.  The  array  depSet  will  be  built  up  to  contain  the  cell  references 
of  all  of  the  dependents  of  its  initial  value. 

We  make  the  initial  call  to  DepSetGen  with  depSet  initialized  to  contain 
Al,  and  the  depSet  element  marker  m  set  to  0.  This  marker’s  value  is  the 
index  of  the  cell  in  array  depSet  whose  dependents  are  sought.  Variable 
initial_size  is  set  to  the  number  of  elements  in  depSet.  Since  depSet 
contains  only  Al,  initial_size  =  1.  Loop  counter  k  ranges  from  0  through 
0.  The  inner  loop  checks  to  see  whether  any  spreadsheet  cell  formula  contains 
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Al.  If  it  does,  the  cell  reference  is  added  to  depSet.  At  the  time  where 
the  inner  loop  is  finished,  both  B1  and  Cl  are  appended  to  depSet.  At 
the  bottom  of  the  outer  loop,  it  is  time  to  make  the  first  recursive  call  to 
DepSetGen. 

The  actual  parameters  passed  to  DepSetGen  are  the  newly  updated  3- 
element  depSet,  and  the  initial  size  of  depSet  before  the  loops,  1.  Now 
entering  the  hrst  recursive  call  of  DepSetGen,  m  is  1,  initial_size  is  3, 
depSet  consists  of  Al,  Bl,  and  Cl.  Loop  variable  k  ranges  from  1  through 
2.  The  first  time  through  the  inner  loop  hnds  all  dependents  of  Bl,  and  adds 
the  one  dependent  found,  El,  to  depSet.  When  k  is  2,  the  inner  loop  finds 
all  dependents  of  Cl,  and  adds  the  one  dependent  found,  FI,  to  depSet. 

Upon  the  next  recursive  call,  the  marker  is  set  to  the  next  unexamined 
element  of  depSet,  El  at  index  3.  No  dependents  are  found.  Similarly  for 
El  at  index  4.  Finally  the  outer  loop  is  skipped,  and  control  is  eventually 
returned  to  the  original  call  of  DepSetGen,  where  m  is  0.  Lastly,  the  if 
statement  is  executed,  and  the  initial  element  Al  is  removed  from  depSet, 
since  Al  is  not  dependent  on  itself.  It  is  assumed  depSet  contains  just  one 
cell  during  the  initial  call  to  DepSetGen.  Though  not  shown  in  Figure  2,  an 
additional  step  of  removing  duplicate  cell  references  from  V  is  required  to 
ensure  that  all  of  its  elements  are  unique. 

To  conclude  this  section,  it  will  be  shown  that  the  Dependency  Set  Gen¬ 
eration  algorithm  just  described  indeed  hnds  all  dependents  of  a  given  cell. 

Theorem  2.  The  Dependency  Set  Generation  algorithm  identifies  all  depen¬ 
dents  of  its  input  set  of  cells. 

Proof.  The  proof  is  by  contradiction.  Assume  we  have  a  spreadsheet  with 
cells  Ci,  i  =  0,1,2, .. .  ,  N  —  1.  Suppose  3Ck  G  C  that  is  dependent  on  Cq,  but 
was  not  identihed  by  the  Dependency  Set  Generation  algorithm.  Then  by 
Property  1,  this  dependence  of  Ck  on  Cq  must  be  due  only  to  C^’s  recursive 
formula.  There  must  then  be  a  directed  path  in  Cq’s  dependency  graph  from 
Ck  to  Cq.  Since  Ck  was  not  identihed,  there  is  at  least  one  cell  Cj  in  this 
path  that  was  not  found  during  the  recursion.  But  this  contradicts  the  step 
in  the  algorithm  that  says  to  hnd  all  direct  dependents  of  Cj-i.  □ 

5.2  Recomputation  of  Dependency  Set  Members 

Once  the  dependency  set  V  has  been  generated,  the  process  of  recomputing 
these  cells  can  begin. 
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To  evaluate  a  cell,  designated  the  original  cell,  each  argument  in  its 
formula,  the  original  formula,  must  be  evaluated.  This  evaluation  is  in 
general  a  recursive  procedure.  By  drawing  a  directed  edge  from  the  original 
cell  to  each  cell  referenced  in  the  formula  (one  edge  per  cell  in  the  formula), 
we  get  a  rooted  tree  that  is  rooted  at  the  original  cell.  By  applying  this 
algorithm  recursively  on  each  cell  in  the  formula,  we  get  several  paths,  each 
of  which  ends  at  a  leaf  having  a  constant-formula  cell.  The  resulting  rooted 
call  tree  is  just  a  subset  of  the  spreadsheet  dependency  graph.  We  may 
then  start  at  the  leaves  of  the  tree  to  construct  a  string  of  formulas  of  the 
cells  in  a  given  path  in  the  direction  back  toward  the  root.  Each  string 
is  a  self-contained  sequence  of  formulas  that  allows  the  original  cell  to  be 
evaluated. 

Lastly,  once  all  the  arguments  in  the  original  formula  have  been  evaluated, 
the  original  cell  can  be  evaluated.  Those  arguments  in  any  formula  that  are 
not  members  of  V  need  not  be  recomputed;  their  current  values  can  be  used 
instead. 

The  foregoing  procedure  is  an  implementation  of  the  depth-first  search 
(DFS)  algorithm  described  in  [3].  In  this  case,  the  use  of  the  terms  “prede¬ 
cessor”  and  “descendant”  in  [3]  is  consistent  with  our  usage.  However,  we 
do  not  use  “timestamping.” 

We  can  see  a  lot  in  common  here  with  the  Dependency  Set  Generation  al¬ 
gorithm.  Once  again,  we  see  a  recursive  procedure  being  described,  although 
it  is  not  as  easily  expressible  in  one  line  as  in  (2).  Instead,  we  will  codify  the 
algorithm  in  the  pseudocode  given  in  Figure  3. 

5.2.1  Example  (continued) 

Continuing  the  example  begun  in  the  Dependency  Set  Generation  section, 
consider  again  the  dependency  graph  in  Figure  1.  Suppose  that  El  is  to  be 
evaluated.  The  well-known  left-to-right,  post-order  tree  traversal  algorithm 
will  be  used  to  specify  the  order  of  evaluation  of  El’s  descendants.  During 
the  first  call  to  EvalCell,  the  recursion  depth  variable  depth  is  initialized  to 
0.  In  addition  to  the  ref  and  formula  helds,  assume  that  a  cell  object  also 
contains  a  held  nchildren  that  gives  the  number  of  children  in  its  formula 
held.  The  object  cell  also  contains  a  child  array,  each  of  whose  elements 
is  a  cell  representing  each  child  in  its  formula.  The  elements  of  child  are 
stored  in  order  of  occurrence  in  its  formula,  element  0  being  the  leftmost 
child,  and  element  nchildren- 1  being  the  rightmost. 
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Input :  Original  Cell 

Output :  Original  Cell  with  newly  computed  value 
depth  =0;  //  Initial  value  (global) 

EvalCell (cell) 

{ 

for  (i=0  through  cell .nchildren-1 ; i++)  { 
depth++ ; 

EvalCell (cell . child [i] ) ; 

} 

parser_str . append(cell . ref  +  ’=’  +  cell. formula  +  ’;’); 
if  (depth  ==  0)  { 

cell. value  =  parse (parser_str) ; 

} 

depth — ; 
return  cell; 


Figure  3:  Dependency  Set  Evaluation  Algorithm 


We  thus  begin  by  calling  EvalCell  with  cell  .ref  set  to  El.  The  recur¬ 
sive  evaluation  of  El’s  call  tree  begins  with  the  leftmost  cell  reference  in  El’s 
formula,  Bl.  B1  G  V,  and  therefore  must  be  recomputed.  We  then  begin 
with  the  leftmost  element  of  its  formula,  and  hnd  that  it  is  the  constant  1. 
This  is  a  constant,  and  thus  requires  no  further  evaluation;  we  move  to  the 
next  element,  Al.  Al  is  the  changed  cell,  and  its  value  is  known,  thus  no 
further  analysis  is  needed. 

We  have  now  reached  the  end  of  Bl’s  formula,  allowing  us  to  compute  its 
value.  We  thus  go  back  up  to  El  to  process  the  next  argument  in  its  formula. 
Cl.  Cl  G  V,  and  therefore  must  be  recomputed.  The  leftmost  child  of  Cl’s 
formula  is  Al,  and  has  already  been  evaluated.  Cl’s  next  child,  D1  ^  V,  and 
thus  does  not  have  to  be  recomputed.  Its  current  value  of  10  is  used. 

We  have  now  recursively  evaluated  all  of  the  children  of  El’s  formula,  and 
completed  the  building  of  parser_str.  This  string  may  then  be  passed  to  a 
parser  for  evaluation  of  El.  ExcelComp  uses  an  LALR  parser  developed  by 
the  author  using  the  tools  JLex  [1]  and  CUP  [14].  LALR  stands  for  LookA- 
head  Left-to-right  identifying  the  Rightmost  production,  and  is  described  in 
[12].  For  this  example,  the  string  is: 
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A1=2;B1=1+A1;A1=2;D1=10;C1=A1+D1;E1=B1+C1;  . 


Here,  “=”  stands  for  assignment,  and  delimits  each  assignment.  It 
is  assumed  that  the  parser  stores  values  via  the  assignment  statements,  and 
that  these  values  may  be  retrieved  at  points  later  in  the  parse  string.  The 
history  of  the  construction  of  the  parser  string  is  summarized  in  Table  1. 


depth 

ceN.ref 

i 

parser_str 

0 

El 

0 

1 

B1 

0 

2 

A1 

0 

Al=2; 

1 

B1 

0 

A1=2;B1=1+A1; 

0 

El 

1 

A1=2;B1=1+A1; 

1 

Cl 

0 

A1=2;B1=1+A1; 

2 

A1 

0 

A1=2;B1=1+A1;A1=2; 

1 

Cl 

1 

A1=2;B1=1+A1;A1=2; 

2 

D1 

0 

A1=2;B1=1+A1;A1=2;D1=10; 

1 

Cl 

1 

A1=2;B1=1+A1;A1=2;D1=10;C1=A1+D1; 

0 

El 

1 

A1=2;B1=1+A1;A1=2;D1=10;C1=A1+D1; 

E1=B1+C1; 

Table  1:  History  of  Parser  String  Construction  for  El 

Note  that  there  is  a  redundant  assignment  “Al=2”  in  parser_str.  This  is 
an  example  suggesting  efficiency  enhancements  that  can  be  seen  to  improve 
the  performance  of  these  algorithms.  Improvements  include,  but  are  not 
necessarily  limited  to: 

1.  Recompute  a  cell  value  only  once. 

2.  Do  not  recursively  evaluate  cells  that  are  not  in  V. 

Although  these  improvements  surely  are  desirable  for  minimizing  the 
number  of  algorithm  steps,  experience  with  their  use  in  ExcelComp  revealed 
that  the  time  to  run  the  additional  code  required  to  implement  these  im¬ 
provements  largely  cancels  out  the  benehts  of  fewer  cell  evaluations. 

More  ideas  concerning  the  speed-up  of  interpreted  mode  are  given  in  the 
Algorithm  Complexity  section. 
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6  Compiled  mode 

While  the  compiled  mode  agrees  with  interpreted  mode  with  respect  to  the 
preservation  of  spreadsheet  integrity,  it  uses  a  clever  postponement  of  compu¬ 
tation  technique  to  update  a  dependent  cell’s  value  just  prior  to  the  reading 
of  its  value  via  its  accessor  method.  Using  this  deferred  recomputation  strat¬ 
egy,  the  execution  time  associated  with  recomputing  those  dependent  cells 
whose  values  are  never  accessed  is  eliminated.  This  deferred  recomputation 
is  similar  to  Excel’s  manual  calculation  mode,  where  the  user  specihes  when 
recomputation  is  to  occur  thus  deferring  immediate  recomputation.  It  is  also 
similar  to  the  mark-sweep  garbage  collection  algorithm  [29]. 

Compiled  mode  is  implemented  by  ExcelComp’s  use  of  the  Cell  API 
[27].  The  highlight  of  this  mode  is  preserving  spreadsheet  integrity  while 
improving  the  running  time  performance  of  ExcelComp.  Several  techniques 
are  used  to  meet  this  goal,  including  the  use  of: 

1.  the  Java  Reflection  API 

2.  Hash  containers,  and 

3.  Deferred  recomputation  of  dependent  cells. 

6.1  The  Cell  Class 

Cell  is  an  abstract  Java  base  class  that  provides  a  framework  for  modeling 
the  cells  of  a  spreadsheet,  each  of  which  is  represented  by  a  class  derived 
from  Cell  named  CellXi,  where  Xi  denotes  the  Al-style  reference  to  its 
corresponding  cell. 

On  its  initial  invocation.  Cell’s  accessor  class  method  getCell  instanti¬ 
ates  a  CellXi  object  via  the  Java  Reflection  API  [25].  This  technique  allows 
a  Java  class  to  instantiate  another  class  whose  name  is  created  at  run  time 
by  the  calling  method.  To  improve  the  efficiency  of  subsequent  accesses. 
Cell  has  a  class  variable  workbook  to  reference  a  HashMap  of  references  to 
previously  instantiated  CellXi  objects.  Similarly,  each  CellXi  object  has 
a  HashSet  named  dependencies  that  contains  references  to  CellXi  objects 
that  correspond  to  cells  that  are  direct  dependents  (parents)  of  the  CellXi 
object  that  owns  dependencies. 

In  a  CellXi’s  constructor,  each  child’s  instance  method  addDependency  is 
called  to  add  Xi  to  that  child’s  dependencies  set.  The  use  of  getCell  during 
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Figure  4:  Preprocessing  requirements  for  ExcelComp  modes 


this  process  causes  each  child’s  object,  on  its  first  access,  to  be  initialized  with 
its  constructor.  A  DFS  traversal  of  CellXi’s  call  tree  thus  occurs  so  that  each 
traversed  parent  appears  in  each  of  its  children’s  dependencies  set.  The  set 
of  all  such  parents  is  the  set  of  discovered  ancestors.  Both  workbook  and 
dependencies  contain  only  minimal  subsets  of  the  full  set  of  their  respective 
data  that  describes  the  entire  spreadsheet  as  determined  by  the  history  of 
Cell  method  calls.  These  subsets  are  updated  as  necessary,  and  suffice  for 
computing  correct  results  when  recomputation  of  cell  values  is  necessary. 

6.2  Preprocessing 

Figure  4  details  the  differences  in  the  preprocessing  requirements  for  the  two 
modes  of  ExcelComp. 

The  preprocessing  necessary  for  compiled  mode  begins  with  the  same 
XML  input  hie  used  in  interpreted  mode.  A  Java  API  named  GenCell  [10] 
is  run  on  this  input  hie  to  produce  a  set  of  Java  source  hies.  Each  source 
hie  dehnes  a  CellXi  class  that  corresponds  to  a  spreadsheet  cell.  GenCell 
uses  a  JLex/CUP-based  parser  similar  to  that  used  in  interpreted  mode  to 
allow  the  translation  of  a  supported  subset  of  the  Excel  formula  language 
into  the  appropriate  Java  statements.  The  important  diherence  between  the 
two  parsers  is  in  their  output.  In  interpreted  mode,  the  ExcelComp  parser 
returns  a  string  at  run  time  that  represents  a  newly-computed  value.  For 
ExcelComp’s  compiled  mode,  the  GenCell  parser  returns  Java  source  code 
that  is  written  to  a  set  of  source  hies  as  an  offiine  preprocessing  task.  Once 
GenCell  has  completed  generating  all  of  the  source  hies,  the  Java  compiler  is 
run  to  compile  these  hies  into  class  hies  of  executable  bytecode.  The  Excel¬ 
Comp  user  must  then  ensure  that  the  Java  classpath  contains  the  appropriate 
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references  to  allow  the  client  application  to  find  the  classes  at  run  time. 

6.3  Dependency  Set  Generation 

The  nature  of  the  preprocessing  performed  in  compiled  mode  allows  Excel- 
Comp  to  handle  Dependency  Set  Generation  as  a  distributed,  on-demand 
task  rather  than  as  an  explicit  set  of  steps  conducted  immediately  after  a 
cell  value  is  changed  as  in  interpreted  mode. 

6.4  Deferred  Recomputation  of  Dependent  Cells 

Rather  than  requiring  that  the  immediate  recomputation  of  all  members  of 
V  occur  after  a  cell’s  value  has  been  changed,  compiled  mode  defers  the 
recomputation  of  any  member  of  T)  until  the  client  programmer  requests  its 
value  via  an  ExcelComp  accessor  method.  This  algorithm  saves  considerable 
time  when  compared  to  its  interpreted  mode  counterpart,  seeing  that  the 
recomputation  of  many  dependencies  whose  value  is  never  sought  is  avoided. 

6.5  Getting  a  Value 

Each  CellXi  object’s  getValue  method  contains  the  Java  encoding  of  the 
formula  for  cell  Xi.  The  value  of  the  cell  is  recomputed  and  stored  as  the 
instance  variable  val  only  when  that  object’s  boolean  dirty  flag  is  true; 
this  flag  thus  allows  this  method  to  avoid  unnecessary  recomputation.  When 
recomputation  is  unnecessary,  getValue  just  returns  the  value  of  val. 

6.6  Setting  a  Value 

When  a  CellXi  object’s  setValue  method  is  called,  its  instance  variable 
val  is  set  to  the  desired  value,  and  a  DFS  traversal  of  all  of  Xi’s  discovered 
ancestors  is  performed  to  ensure  that  each  such  ancestor’s  dirty  flag  is  set  to 
true.  This  ensures  that  all  ancestors’  values  are  recomputed  on  a  subsequent 
call  to  an  ancestor’s  getValue  method.  Note  that  recomputations  are  only 
done  if  an  ancestor’s  getValue  method  is  called,  thus  saving  the  time  of 
recomputing  ancestor  values  that  may  never  be  accessed. 
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6.7  Example 

The  following  example  illustrates  the  workings  of  compiled  mode  using  Fig¬ 
ure  1.  Given  the  spreadsheet  represented  by  this  graph,  we  will  use  compiled 
mode  to  set  Al’s  value  to  2,  and  then  get  the  value  of  El.  Observe  that  El’s 
value  with  Al  set  to  1  is  13.  By  changing  Al’s  value  to  2,  we  expect  the  new 
value  of  El  to  be  15. 

The  first  statement  given  by  the  client  program  is: 

Cell . getCell ( " Al " ) . setValue (2) ; 

The  initial  part  of  this  statement,  Cell.getCellC'Al"),  creates  a  new  in¬ 
stance  of  CellAl,  since  one  does  not  yet  exist.  This  new  instance  is  cre¬ 
ated  using  the  forName  class  method  in  the  package  java. lang. Class  [19]. 
CellAl’s  base  class  constructor  CellO  is  first  called  to  set  CellAl’s  dirty 
flag  to  true.  CellAl’s  val  variable  is  set  to  1  by  its  constructor.  The  last 
part  of  this  statement  calls  Cell’s  setValue  method  to  set  val  to  2.  In 
general,  setValue  recursively  marks  all  of  Al’s  discovered  ancestors  as  dirty. 
However,  since  no  ancestors  have  yet  been  discovered,  and  thus  no  corre¬ 
sponding  CellXi  objects  have  yet  been  instantiated,  no  such  marking  occurs 
here.  We  are  thus  left  with  one  instance  of  CellAl  that  is  marked  as  dirty 
and  has  a  value  of  2. 

We  now  get  the  value  of  El.  The  appropriate  statement  is: 

Cell.getCellC'El")  .getValueO  ; 

The  hrst  part  of  the  statement  behaves  in  the  same  way  as  for  Al  described 
above,  only  now  the  newly  created  object  is  an  instance  of  CellEl.  In  addi¬ 
tion,  CellEl’s  constructor  initiates  a  DFS  traversal  of  all  of  El’s  descendants 
to  update  each  descendant’s  HashSet  dependencies.  For  this  example,  both 
CellBl’s  and  CellCl’s  dependencies  sets  are  updated  by  having  a  reference 
to  CellEl  added.  Note  that  CellCl’s  dependencies  set  does  not  refer  to 
CellFl  since,  although  FI  is  dependent  on  Cl,  it  is  not  a  descendant  of 
El.  Such  a  reference  to  CellFl  need  only  be  added  if  FI  is  the  subject  of 
future  method  calls.  The  last  part  of  this  statement  checks  to  see  whether 
the  instance  of  CellEl  has  been  marked  as  dirty.  Since  CellEl  was  just 
constructed  anew,  it  is  marked  as  dirty  and  thus  its  value  must  be  recom¬ 
puted  (in  this  case,  computed  for  the  first  time).  The  statement  in  CellEl’s 
getValue  method  that  accomplishes  this  is: 
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this.val  =  Cell.getCellC'Bl")  .getValueO  + 
Cell.getCellC'Cl")  .getValueO  ; 

Since  the  objects  for  Bl,  Cl,  and  D1  were  all  marked  as  dirty  during  the 
DFS  traversal  in  CellEl’s  constructor,  each  object’s  getValue  method  will 
recompute  the  value  for  that  cell.  Each  object’s  getValue  method  resets  that 
object’s  dirty  flag  to  false  after  the  recomputation.  Subsequent  accesses 
to  CellXi  values  that  have  not  been  affected  by  a  change  to  a  descendant’s 
value  simply  return  the  value  stored  in  val  with  no  recomputation  necessary. 


7  Performance 

7.1  Tests 

ExcelComp  was  tested  for  its  running  time  performance  in  the  SATCOM 
Availability  Analyst  (SA2)  Java  application  [7].  The  addition  of  ExcelComp 
API  calls  to  SA2’s  map  display  function  was  chosen  for  test  due  to  its  de¬ 
manding  requirement  that  8,518  data  points  be  updated  for  an  on-screen 
Mercator  map  in  such  a  way  that  the  user  is  not  burdened  by  long  wait 
times  for  a  complete  update  of  the  map.  Processing  each  of  the  data  points 
required  3  calls  to  ExcelComp  methods;  2  of  these  calls  each  changed  a  cell 
value  from  the  input  spreadsheet,  and  the  last  call  read  back  a  cell  value 
of  interest  from  the  newly  updated  spreadsheet.  The  map  display  function 
was  selected  and  run  10-|-  times  in  each  mode  to  characterize  ExcelComp’s 
performance.  Running  times  associated  with  the  first  invocation  of  the  map 
function  were  greater  than  subsequent  trials,  and  thus  were  considered  out¬ 
liers  and  removed  from  the  representative  data.  These  larger  values  probably 
reflect  JVM-related  setup  steps  that  are  not  required  on  subsequent  trials. 

The  tests  were  conducted  on  a  Hewlett-Packard  HP  OmniBook  4150  B 
running  under  Microsoft  Windows  98  on  a  Pentium  HI  650  MHz  processor. 
SA2  was  run  using  the  Sun  Microsystems  JVM  version  1.3.  Running  times 
were  computed  as  the  difference  in  the  start  and  end  times  returned  by  the 
Java  method  System. currentTimeMillis ()  [23]. 

The  results  of  the  performance  tests  are  summarized  in  Table  2.  The 
sample  standard  deviation  is  computed  as  the  positive  square  root  of  the 
unbiased  sample  variance.  The  large  difference  in  performance  between  the 
modes  highlights  how  compiled  mode  can  provide  a  very  acceptable  perfor¬ 
mance  level  in  a  case  where  interpreted  mode,  requiring  over  5  minutes  to 
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Compiled  Mode 

Interpreted  Mode 

Average 

677 

347057 

Sample  Standard  Deviation 

116 

254 

Table  2:  ExcelComp  Running  Time  (milliseconds)  Performance  over  10  Sam¬ 
ples 

complete,  would  be  unacceptably  slow.  In  this  particular  case,  it  is  essential 
that  compiled  mode  be  chosen  to  make  the  use  of  ExcelComp  feasible. 

7.2  Algorithm  Complexity 

The  graph  traversal  algorithms  that  underlie  ExcelComp  are  well  known  to 
be  efficient.  Both  DES  and  BFS  have  running  times  that  are  linear  in  the 
size  of  the  graph’s  adjacency  list.  Specihcally,  BFS  is  0(y  +  E),  and  DES  is 
e{V  +  E)  [3,  sect.  22.2,  22.3]. 

Interpreted  mode  does  not  construct  an  adjacency  list,  and  could  very  well 
beneht  from  a  redesign  to  create  this  list  upon  the  loading  of  the  spreadsheet 
in  the  ExcelComp  constructor.  Because  this  construction  must  take  place 
at  run  time,  the  user  would  incur  a  one-time  performance  penalty  for  this 
initialization  step.  The  absence  of  an  adjacency  list  suggests  that  interpreted 
mode’s  running  time  is  probably  greater  than  the  linear  time  cited  above. 

Compiled  mode,  on  the  other  hand,  does  use  a  variation  of  adjacency 
lists  in  CellXi’s  dependencies  set.  However,  while  an  adjacency  list  stores 
references  to  children,  dependencies  stores  references  to  parents  of  CellXi. 
Compiled  mode  incurs  a  setup  penalty  during  the  discovery  of  cells,  but 
subsequent  accesses  to  CellXi  objects  are  more  efficient  through  the  use  of 
dependencies  and  workbook.  Its  use  of  adjacency  lists,  DES  traversals,  and 
the  efficient  Java  collections  framework  suggests  that  compiled  mode  has  a 
running  time  that  is  close  to  0(R  +  E). 


8  Conclusion 

We  have  seen  that  a  graph  representation  of  spreadsheet  cell  dependencies 
provides  insight  into  the  requirements  of  the  algorithms  used  for  the  au¬ 
tomatic  recomputation  of  dependent  cells.  Straightforward  implementation 
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of  well-known  graph  traversal  algorithms  suffices  for  correct  recomputation, 
however  the  adaptation  of  a  well-studied  garbage  collection  algorithm  along 
with  facilities  made  available  in  the  Java  language  enable  client  programs  to 
run  much  faster,  given  some  additional  preprocessing. 

The  client  programmer  should  choose  the  mode  of  ExcelComp  according 
to  an  analysis  of  the  application’s  run  time  requirements  and  the  tradeoffs 
between  the  modes  as  described  in  this  paper. 
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